Search Results: "Matthew Garrett"

12 July 2022

Matthew Garrett: Responsible stewardship of the UEFI secure boot ecosystem

After I mentioned that Lenovo are now shipping laptops that only boot Windows by default, a few people pointed to a Lenovo document that says:

Starting in 2022 for Secured-core PCs it is a Microsoft requirement for the 3rd Party Certificate to be disabled by default.

"Secured-core" is a term used to describe machines that meet a certain set of Microsoft requirements around firmware security, and by and large it's a good thing - devices that meet these requirements are resilient against a whole bunch of potential attacks in the early boot process. But unfortunately the 2022 requirements don't seem to be publicly available, so it's difficult to know what's being asked for and why. But first, some background.

Most x86 UEFI systems that support Secure Boot trust at least two certificate authorities:

1) The Microsoft Windows Production PCA - this is used to sign the bootloader in production Windows builds. Trusting this is sufficient to boot Windows.
2) The Microsoft Corporation UEFI CA - this is used by Microsoft to sign non-Windows UEFI binaries, including built-in drivers for hardware that needs to work in the UEFI environment (such as GPUs and network cards) and bootloaders for non-Windows.

The apparent secured-core requirement for 2022 is that the second of these CAs should not be trusted by default. As a result, drivers or bootloaders signed with this certificate will not run on these systems. This means that, out of the box, these systems will not boot anything other than Windows[1].

Given the association with the secured-core requirements, this is presumably a security decision of some kind. Unfortunately, we have no real idea what this security decision is intended to protect against. The most likely scenario is concerns about the (in)security of binaries signed with the third-party signing key - there are some legitimate concerns here, but I'm going to cover why I don't think they're terribly realistic.

The first point is that, from a boot security perspective, a signed bootloader that will happily boot unsigned code kind of defeats the point. Kaspersky did it anyway. The second is that even a signed bootloader that is intended to only boot signed code may run into issues in the event of security vulnerabilities - the Boothole vulnerabilities are an example of this, covering multiple issues in GRUB that could allow for arbitrary code execution and potential loading of untrusted code.

So we know that signed bootloaders that will (either through accident or design) execute unsigned code exist. The signatures for all the known vulnerable bootloaders have been revoked, but that doesn't mean there won't be other vulnerabilities discovered in future. Configuring systems so that they don't trust the third-party CA means that those signed bootloaders won't be trusted, which means any future vulnerabilities will be irrelevant. This seems like a simple choice?

There's actually a couple of reasons why I don't think it's anywhere near that simple. The first is that whenever a signed object is booted by the firmware, the trusted certificate used to verify that object is measured into PCR 7 in the TPM. If a system previously booted with something signed with the Windows Production CA, and is now suddenly booting with something signed with the third-party UEFI CA, the values in PCR 7 will be different. TPMs support "sealing" a secret - encrypting it with a policy that the TPM will only decrypt it if certain conditions are met. Microsoft make use of this for their default Bitlocker disk encryption mechanism. The disk encryption key is encrypted by the TPM, and associated with a specific PCR 7 value. If the value of PCR 7 doesn't match, the TPM will refuse to decrypt the key, and the machine won't boot. This means that attempting to attack a Windows system that has Bitlocker enabled using a non-Windows bootloader will fail - the system will be unable to obtain the disk unlock key, which is a strong indication to the owner that they're being attacked.

The second is that this is predicated on the idea that removing the third-party bootloaders and drivers removes all the vulnerabilities. In fact, there's been rather a lot of vulnerabilities in the Windows bootloader. A broad enough vulnerability in the Windows bootloader is arguably a lot worse than a vulnerability in a third-party loader, since it won't change the PCR 7 measurements and the system will boot happily. Removing trust in the third-party CA does nothing to protect against this.

The third reason doesn't apply to all systems, but it does to many. System vendors frequently want to ship diagnostic or management utilities that run in the boot environment, but would prefer not to have to go to the trouble of getting them all signed by Microsoft. The simple solution to this is to ship their own certificate and sign all their tooling directly - the secured-core Lenovo I'm looking at currently is an example of this, with a Lenovo signing certificate. While everything signed with the third-party signing certificate goes through some degree of security review, there's no requirement for any vendor tooling to be reviewed at all. Removing the third-party CA does nothing to protect the user against the code that's most likely to contain vulnerabilities.

Obviously I may be missing something here - Microsoft may well have a strong technical justification. But they haven't shared it, and so right now we're left making guesses. And right now, I just don't see a good security argument.

But let's move on from the technical side of things and discuss the broader issue. The reason UEFI Secure Boot is present on most x86 systems is that Microsoft mandated it back in 2012. Microsoft chose to be the only trusted signing authority. Microsoft made the decision to assert that third-party code could be signed and trusted.

We've certainly learned some things since then, and a bunch of things have changed. Third-party bootloaders based on the Shim infrastructure are now reviewed via a community-managed process. We've had a productive coordinated response to the Boothole incident, which also taught us that the existing revocation strategy wasn't going to scale. In response, the community worked with Microsoft to develop a specification for making it easier to handle similar events in future. And it's also worth noting that after the initial Boothole disclosure was made to the GRUB maintainers, they proactively sought out other vulnerabilities in their codebase rather than simply patching what had been reported. The free software community has gone to great lengths to ensure third-party bootloaders are compatible with the security goals of UEFI Secure Boot.

So, to have Microsoft, the self-appointed steward of the UEFI Secure Boot ecosystem, turn round and say that a bunch of binaries that have been reviewed through processes developed in negotiation with Microsoft, implementing technologies designed to make management of revocation easier for Microsoft, and incorporating fixes for vulnerabilities discovered by the developers of those binaries who notified Microsoft of these issues despite having no obligation to do so, and which have then been signed by Microsoft are now considered by Microsoft to be insecure is, uh, kind of impolite? Especially when unreviewed vendor-signed binaries are still considered trustworthy, despite no external review being carried out at all.

If Microsoft had a set of criteria used to determine whether something is considered sufficiently trustworthy, we could determine which of these we fell short on and do something about that. From a technical perspective, Microsoft could set criteria that would allow a subset of third-party binaries that met additional review be trusted without having to trust all third-party binaries[2]. But, instead, this has been a decision made by the steward of this ecosystem without consulting major stakeholders.

If there are legitimate security concerns, let's talk about them and come up with solutions that fix them without doing a significant amount of collateral damage. Don't complain about a vendor blocking your apps and then do the same thing yourself.

[Edit to add: there seems to be some misunderstanding about where this restriction is being imposed. I bought this laptop because I'm interested in investigating the Microsoft Pluton security processor, but Pluton is not involved at all here. The restriction is being imposed by the firmware running on the main CPU, not any sort of functionality implemented on Pluton]

[1] They'll also refuse to run any drivers that are stored in flash on Thunderbolt devices, which means eGPU setups may be more complicated, as will netbooting off Thunderbolt-attached NICs
[2] Use a different leaf cert to sign the new trust tier, add the old leaf cert to dbx unless a config option is set, leave the existing intermediate in db

comments

8 July 2022

Matthew Garrett: Lenovo shipping new laptops that only boot Windows by default

I finally managed to get hold of a Thinkpad Z13 to examine a functional implementation of Microsoft's Pluton security co-processor. Trying to boot Linux from a USB stick failed out of the box for no obvious reason, but after further examination the cause became clear - the firmware defaults to not trusting bootloaders or drivers signed with the Microsoft 3rd Party UEFI CA key. This means that given the default firmware configuration, nothing other than Windows will boot. It also means that you won't be able to boot from any third-party external peripherals that are plugged in via Thunderbolt.

There's no security benefit to this. If you want security here you're paying attention to the values measured into the TPM, and thanks to Microsoft's own specification for measurements made into PCR 7, switching from booting Windows to booting something signed with the 3rd party signing key will change the measurements and invalidate any sealed secrets. It's trivial to detect this. Distrusting the 3rd party CA by default doesn't improve security, it just makes it harder for users to boot alternative operating systems.

Lenovo, this isn't OK. The entire architecture of UEFI secure boot is that it allows for security without compromising user choice of OS. Restricting boot to Windows by default provides no security benefit but makes it harder for people to run the OS they want to. Please fix it.

comments

16 May 2022

Matthew Garrett: Can we fix bearer tokens?

Last month I wrote about how bearer tokens are just awful, and a week later Github announced that someone had managed to exfiltrate bearer tokens from Heroku that gave them access to, well, a lot of Github repositories. This has inevitably resulted in a whole bunch of discussion about a number of things, but people seem to be largely ignoring the fundamental issue that maybe we just shouldn't have magical blobs that grant you access to basically everything even if you've copied them from a legitimate holder to Honest John's Totally Legitimate API Consumer.

To make it clearer what the problem is here, let's use an analogy. You have a safety deposit box. To gain access to it, you simply need to be able to open it with a key you were given. Anyone who turns up with the key can open the box and do whatever they want with the contents. Unfortunately, the key is extremely easy to copy - anyone who is able to get hold of your keyring for a moment is in a position to duplicate it, and then they have access to the box. Wouldn't it be better if something could be done to ensure that whoever showed up with a working key was someone who was actually authorised to have that key?

To achieve that we need some way to verify the identity of the person holding the key. In the physical world we have a range of ways to achieve this, from simply checking whether someone has a piece of ID that associates them with the safety deposit box all the way up to invasive biometric measurements that supposedly verify that they're definitely the same person. But computers don't have passports or fingerprints, so we need another way to identify them.

When you open a browser and try to connect to your bank, the bank's website provides a TLS certificate that lets your browser know that you're talking to your bank instead of someone pretending to be your bank. The spec allows this to be a bi-directional transaction - you can also prove your identity to the remote website. This is referred to as "mutual TLS", or mTLS, and a successful mTLS transaction ends up with both ends knowing who they're talking to, as long as they have a reason to trust the certificate they were presented with.

That's actually a pretty big constraint! We have a reasonable model for the server - it's something that's issued by a trusted third party and it's tied to the DNS name for the server in question. Clients don't tend to have stable DNS identity, and that makes the entire thing sort of awkward. But, thankfully, maybe we don't need to? We don't need the client to be able to prove its identity to arbitrary third party sites here - we just need the client to be able to prove it's a legitimate holder of whichever bearer token it's presenting to that site. And that's a much easier problem.

Here's the simple solution - clients generate a TLS cert. This can be self-signed, because all we want to do here is be able to verify whether the machine talking to us is the same one that had a token issued to it. The client contacts a service that's going to give it a bearer token. The service requests mTLS auth without being picky about the certificate that's presented. The service embeds a hash of that certificate in the token before handing it back to the client. Whenever the client presents that token to any other service, the service ensures that the mTLS cert the client presented matches the hash in the bearer token. Copy the token without copying the mTLS certificate and the token gets rejected. Hurrah hurrah hats for everyone.

Well except for the obvious problem that if you're in a position to exfiltrate the bearer tokens you can probably just steal the client certificates and keys as well, and now you can pretend to be the original client and this is not adding much additional security. Fortunately pretty much everything we care about has the ability to store the private half of an asymmetric key in hardware (TPMs on Linux and Windows systems, the Secure Enclave on Macs and iPhones, either a piece of magical hardware or Trustzone on Android) in a way that avoids anyone being able to just steal the key.

How do we know that the key is actually in hardware? Here's the fun bit - it doesn't matter. If you're issuing a bearer token to a system then you're already asserting that the system is trusted. If the system is lying to you about whether or not the key it's presenting is hardware-backed then you've already lost. If it lied and the system is later compromised then sure all your apes get stolen, but maybe don't run systems that lie and avoid that situation as a result?

Anyway. This is covered in RFC 8705 so why aren't we all doing this already? From the client side, the largest generic issue is that TPMs are astonishingly slow in comparison to doing a TLS handshake on the CPU. RSA signing operations on TPMs can take around half a second, which doesn't sound too bad, except your browser is probably establishing multiple TLS connections to subdomains on the site it's connecting to and performance is going to tank. Fixing this involves doing whatever's necessary to convince the browser to pipe everything over a single TLS connection, and that's just not really where the web is right at the moment. Using EC keys instead helps a lot (~0.1 seconds per signature on modern TPMs), but it's still going to be a bottleneck.

The other problem, of course, is that ecosystem support for hardware-backed certificates is just awful. Windows lets you stick them into the standard platform certificate store, but the docs for this are hidden in a random PDF in a Github repo. Macs require you to do some weird bridging between the Secure Enclave API and the keychain API. Linux? Well, the standard answer is to do PKCS#11, and I have literally never met anybody who likes PKCS#11 and I have spent a bunch of time in standards meetings with the sort of people you might expect to like PKCS#11 and even they don't like it. It turns out that loading a bunch of random C bullshit that has strong feelings about function pointers into your security critical process is not necessarily something that is going to improve your quality of life, so instead you should use something like this and just have enough C to bridge to a language that isn't secretly plotting to kill your pets the moment you turn your back.

And, uh, obviously none of this matters at all unless people actually support it. Github has no support at all for validating the identity of whoever holds a bearer token. Most issuers of bearer tokens have no support for embedding holder identity into the token. This is not good! As of last week, all three of the big cloud providers support virtualised TPMs in their VMs - we should be running CI on systems that can do that, and tying any issued tokens to the VMs that are supposed to be making use of them.

So sure this isn't trivial. But it's also not impossible, and making this stuff work would improve the security of, well, everything. We literally have the technology to prevent attacks like Github suffered. What do we have to do to get people to actually start working on implementing that?

comments

19 April 2022

Steve McIntyre: Firmware - what are we going to do about it?

TL;DR: firmware support in Debian sucks, and we need to change this. See the "My preference, and rationale" Section below. In my opinion, the way we deal with (non-free) firmware in Debian is a mess, and this is hurting many of our users daily. For a long time we've been pretending that supporting and including (non-free) firmware on Debian systems is not necessary. We don't want to have to provide (non-free) firmware to our users, and in an ideal world we wouldn't need to. However, it's very clearly no longer a sensible path when trying to support lots of common current hardware. Background - why has (non-free) firmware become an issue? Firmware is the low-level software that's designed to make hardware devices work. Firmware is tightly coupled to the hardware, exposing its features, providing higher-level functionality and interfaces for other software to use. For a variety of reasons, it's typically not Free Software. For Debian's purposes, we typically separate firmware from software by considering where the code executes (does it run on a separate processor? Is it visible to the host OS?) but it can be difficult to define a single reliable dividing line here. Consider the Intel/AMD CPU microcode packages, or the U-Boot firmware packages as examples. In times past, all necessary firmware would normally be included directly in devices / expansion cards by their vendors. Over time, however, it has become more and more attractive (and therefore more common) for device manufacturers to not include complete firmware on all devices. Instead, some devices just embed a very simple set of firmware that allows for upload of a more complete firmware "blob" into memory. Device drivers are then expected to provide that blob during device initialisation. There are a couple of key drivers for this change:

Cost: it's typically cheaper to fit smaller flash memory (or no flash at all) onto a device. The cost difference may seem small in many cases, but reducing the bill of materials (BOM) even by a few cents can make a substantial difference to the economics of a product. For most vendors, they will have to implement device drivers anyway and it's not difficult to include firmware in that driver.
Flexibility: it's much easier to change the behaviour of a device by simply changing to a different blob. This can potentially cover lots of different use cases:
- separating deadlines for hardware and software in manufacturing (drivers and firmware can be written and shipped later);
- bug fixes and security updates (e.g. CPU microcode changes);
- changing configuration of a device for different users or products (e.g. potentially different firmware to enable different frequencies on a radio product);
- changing fundamental device operation (e.g. switching between RAID and JBOD functionality on a disk controller).

Due to these reasons, more and more devices in a typical computer now need firmware to be uploaded at runtime for them to function correctly. This has grown:

Going back 10 years or so, most computers only needed firmware uploads to make WiFi hardware work.
A growing number of wired network adapters now demand firmware uploads. Some will work in a limited way but depend on extra firmware to allow advanced features like TCP segmentation offload (TSO). Others will refuse to work at all without a firmware upload.
More and more graphics adapters now also want firmware uploads to provide any non-basic functions. A simple basic (S)VGA-compatible framebuffer is not enough for most users these days; modern desktops expect 3D acceleration, and a lot of current hardware will not provide that without extra firmware.
Current generations of common Intel-based laptops also need firmware uploads to make audio work (see the firmware-sof-signed package).

At the beginning of this timeline, a typical Debian user would be able to use almost all of their computer's hardware without needing any firmware blobs. It might have been inconvenient to not be able to use the WiFi, but most laptops had wired ethernet anyway. The WiFi could always be enabled and configured after installation. Today, a user with a new laptop from most vendors will struggle to use it at all with our firmware-free Debian installation media. Modern laptops normally don't come with wired ethernet now. There won't be any usable graphics on the laptop's screen. A visually-impaired user won't get any audio prompts. These experiences are not acceptable, by any measure. There are new computers still available for purchase today which don't need firmware to be uploaded, but they are growing less and less common. Current state of firmware in Debian For clarity: obviously not all devices need extra firmware uploading like this. There are many devices that depend on firmware for operation, but we never have to think about them in normal circumstances. The code is not likely to be Free Software, but it's not something that we in Debian must spend our time on as we're not distributing that code ourselves. Our problems come when our user needs extra firmware to make their computer work, and they need/expect us to provide it. We have a small set of Free firmware binaries included in Debian main, and these are included on our installation and live media. This is great - we all love Free Software and this works. However, there are many more firmware binaries that are not Free. If we are legally able to redistribute those binaries, we package them up and include them in the non-free section of the archive. As Free Software developers, we don't like providing or supporting non-free software for our users, but we acknowledge that it's sometimes a necessary thing for them. This tension is acknowledged in the Debian Free Software Guidelines. This tension extends to our installation and live media. As non-free is officially not considered part of Debian, our official media cannot include anything from non-free. This has been a deliberate policy for many years. Instead, we have for some time been building a limited parallel set of "unofficial non-free" images which include non-free firmware. These non-free images are produced by the same software that we use for the official images, and by the same team. There are a number of issues here that make developers and users unhappy:

Building, testing and publishing two sets of images takes more effort.
We don't really want to be providing non-free images at all, from a philosophy point of view. So we mainly promote and advertise the preferred official free images. That can be a cause of confusion for users. We do link to the non-free images in various places, but they're not so easy to find.
Using non-free installation media will cause more installations to use non-free software by default. That's not a great story for us, and we may end up with more of our users using non-free software and believing that it's all part of Debian.
A number of users and developers complain that we're wasting their time by publishing official images that are just not useful for a lot (a majority?) of users.

We should do better than this. Options The status quo is a mess, and I believe we can and should do things differently. I see several possible options that the images team can choose from here. However, several of these options could undermine the principles of Debian. We don't want to make fundamental changes like that without the clear backing of the wider project. That's why I'm writing this...

Keep the existing setup. It's horrible, but maybe it's the best we can do? (I hope not!)
We could just stop providing the non-free unofficial images altogether. That's not really a promising route to follow - we'd be making it even harder for users to install our software. While ideologically pure, it's not going to advance the cause of Free Software.
We could stop pretending that the non-free images are unofficial, and maybe move them alongside the normal free images so they're published together. This would make them easier to find for people that need them, but is likely to cause users to question why we still make any images without firmware if they're otherwise identical.
The images team technically could simply include non-free into the official images, and add firmware packages to the input lists for those images. However, that would still leave us with problem 3 from above (non-free generally enabled on most installations).
We could split out the non-free firmware packages into a new non-free-firmware component in the archive, and allow a specific exception only to allow inclusion of those packages on our official media. We would then generate only one set of official media, including those non-free firmware packages. (We've already seen various suggestions in recent years to split up the non-free component of the archive like this, for example into non-free-firmware, non-free-doc, non-free-drivers, etc. Disagreement (bike-shedding?) about the split caused us to not make any progress on this. I believe this project should be picked up and completed. We don't have to make a perfect solution here immediately, just something that works well enough for our needs today. We can always tweak and improve the setup incrementally if that's needed.)

These are the most likely possible options, in my opinion. If you have a better suggestion, please let us know! I'd like to take this set of options to a GR, and do it soon. I want to get a clear decision from the wider Debian project as to how to organise firmware and installation images. If we do end up changing how we do things, I want a clear mandate from the project to do that. My preference, and rationale Mainly, I want to see how the project as a whole feels here - this is a big issue that we're overdue solving. What would I choose to do? My personal preference would be to go with option 5: split the non-free firmware into a special new component and include that on official media. Does that make me a sellout? I don't think so. I've been passionately supporting and developing Free Software for more than half my life. My philosophy here has not changed. However, this is a complex and nuanced situation. I firmly believe that sharing software freedom with our users comes with a responsibility to also make our software useful. If users can't easily install and use Debian, that helps nobody. By splitting things out here, we would enable users to install and use Debian on their hardware, without promoting/pushing higher-level non-free software in general. I think that's a reasonable compromise. This is simply a change to recognise that hardware requirements have moved on over the years. Further work If we do go with the changes in option 5, there are other things we could do here for better control of and information about non-free firmware:

Along with adding non-free firmware onto media, when the installer (or live image) runs, we should make it clear exactly which firmware packages have been used/installed to support detected hardware. We could link to docs about each, and maybe also to projects working on Free re-implementations.
Add an option at boot to explicitly disable the use of the non-free firmware packages, so that users can choose to avoid them.

Acknowledgements Thanks to people who reviewed earlier versions of this document and/or made suggestions for improvement, in particular:

Cyril Brulebois
Matthew Garrett
David Leggett
Martin Michlmayr
Andy Simpkins
Neil Williams

17 April 2022

Matthew Garrett: The Freedom Phone is not great at privacy

The Freedom Phone advertises itself as a "Free speech and privacy first focused phone". As documented on the features page, it runs ClearOS, an Android-based OS produced by Clear United (or maybe one of the bewildering array of associated companies, we'll come back to that later). It's advertised as including Signal, but what's shipped is not the version available from the Signal website or any official app store - instead it's this fork called "ClearSignal".

The first thing to note about ClearSignal is that the privacy policy link from that page 404s, which is not a great start. The second thing is that it has a version number of 5.8.14, which is strange because upstream went from 5.8.10 to 5.9.0. The third is that, despite Signal being GPL 3, there's no source code available. So, I grabbed jadx and started looking for differences between ClearSignal and the upstream 5.8.10 release. The results were, uh, surprising.

First up is that they seem to have integrated ACRA, a crash reporting framework. This feels a little odd - in the absence of a privacy policy, it's unclear what information this gathers or how it'll be stored. Having a piece of privacy software automatically uploading information about what you were doing in the event of a crash with no notification other than a toast that appears saying "Crash Report" feels a little dubious.

Next is that Signal (for fairly obvious reasons) warns you if your version is out of date and eventually refuses to work unless you upgrade. ClearSignal has dealt with this problem by, uh, simply removing that code. The MacOS version of the desktop app they provide for download seems to be derived from a release from last September, which for an Electron-based app feels like a pretty terrible idea. Weirdly, for Windows they link to an official binary release from February 2021, and for Linux they tell you how to use the upstream repo properly. I have no idea what's going on here.

They've also added support for network backups of your Signal data. This involves the backups being pushed to an S3 bucket using credentials that are statically available in the app. It's ok, though, each upload has some sort of nominally unique identifier associated with it, so it's not trivial to just download other people's backups. But, uh, where does this identifier come from? It turns out that Clear Center, another of the Clear family of companies, employs a bunch of people to work on a ClearID[1], some sort of decentralised something or other that seems to be based on KERI. There's an overview slide deck here which didn't really answer any of my questions and as far as I can tell this is entirely lacking any sort of peer review, but hey it's only the one thing that stops anyone on the internet being able to grab your Signal backups so how important can it be.

The final thing, though? They've extended Signal's invitation support to encourage users to get others to sign up for Clear United. There's an exposed API endpoint called "get_user_email_by_mobile_number" which does exactly what you'd expect - if you give it a registered phone number, it gives you back the associated email address. This requires no authentication. But it gets better! The API to generate a referral link to send to others sends the name and phone number of everyone in your phone's contact list. There does not appear to be any indication that this is going to happen.

So, from a privacy perspective, going to go with things being some distance from ideal. But what's going on with all these Clear companies anyway? They all seem to be related to Michael Proper, who founded the Clear Foundation in 2009. They are, perhaps unsurprisingly, heavily invested in blockchain stuff, while Clear United also appears to be some sort of multi-level marketing scheme which has a membership agreement that includes the somewhat astonishing claim that:

Specifically, the initial focus of the Association will provide members with supplements and technologies for:

9a. Frequency Evaluation, Scans, Reports;

9b. Remote Frequency Health Tuning through Quantum Entanglement;

9c. General and Customized Frequency Optimizations;

- there's more discussion of this and other weirdness here. Clear Center, meanwhile, has a Chief Physics Officer? I have a lot of questions.

Anyway. We have a company that seems to be combining blockchain and MLM, has some opinions about Quantum Entanglement, bases the security of its platform on a set of novel cryptographic primitives that seem to have had no external review, has implemented an API that just hands out personal information without any authentication and an app that appears more than happy to upload all your contact details without telling you first, has failed to update this app to keep up with upstream security updates, and is violating the upstream license. If this is their idea of "privacy first", I really hate to think what their code looks like when privacy comes further down the list.

[1] Pointed out to me here

comments

5 April 2022

Matthew Garrett: Bearer tokens are just awful

As I mentioned last time, bearer tokens are not super compatible with a model in which every access is verified to ensure it's coming from a trusted device. Let's talk about that in a bit more detail.

First off, what is a bearer token? In its simplest form, it's simply an opaque blob that you give to a user after an authentication or authorisation challenge, and then they show it to you to prove that they should be allowed access to a resource. In theory you could just hand someone a randomly generated blob, but then you'd need to keep track of which blobs you've issued and when they should be expired and who they correspond to, so frequently this is actually done using JWTs which contain some base64 encoded JSON that describes the user and group membership and so on and then have a signature associated with them so whenever the user presents one you can just validate the signature and then assume that the contents of the JSON are trustworthy.

One thing to note here is that the crypto is purely between whoever issued the token and whoever validates the token - as far as the server is concerned, any client who can just show it the token is just fine as long as the signature is verified. There's no way to verify the client's state, so one of the core ideas of Zero Trust (that we verify that the client is in a trustworthy state on every access) is already violated.

Can we make things not terrible? Sure! We may not be able to validate the client state on every access, but we can validate the client state when we issue the token in the first place. When the user hits a login page, we do state validation according to whatever policy we want to enforce, and if the client violates that policy we refuse to issue a token to it. If the token has a sufficiently short lifetime then an attacker is only going to have a short period of time to use that token before it expires and then (with luck) they won't be able to get a new one because the state validation will fail.

Except! This is fine for cases where we control the issuance flow. What if we have a scenario where a third party authenticates the client (by verifying that they have a valid token issued by their ID provider) and then uses that to issue their own token that's much longer lived? Well, now the client has a long-lived token sitting on it. And if anyone copies that token to another device, they can now pretend to be that client.

This is, sadly, depressingly common. A lot of services will verify the user, and then issue an oauth token that'll expire some time around the heat death of the universe. If a client system is compromised and an attacker just copies that token to another system, they can continue to pretend to be the legitimate user until someone notices (which, depending on whether or not the service in question has any sort of audit logs, and whether you're paying any attention to them, may be once screenshots of your data show up on Twitter).

This is a problem! There's no way to fit a hosted service that behaves this way into a Zero Trust model - the best you can say is that a token was issued to a device that was, around that time, apparently trustworthy, and now it's some time later and you have literally no idea whether the device is still trustworthy or if the token is still even on that device.

But wait, there's more! Even if you're nowhere near doing any sort of Zero Trust stuff, imagine the case of a user having a bunch of tokens from multiple services on their laptop, and then they leave their laptop unlocked in a cafe while they head to the toilet and whoops it's not there any more, better assume that someone has access to all the data on there. How many services has our opportunistic new laptop owner gained access to as a result? How do we revoke all of the tokens that are sitting there on the local disk? Do you even have a policy for dealing with that?

There isn't a simple answer to all of these problems. Replacing bearer tokens with some sort of asymmetric cryptographic challenge to the client would at least let us tie the tokens to a TPM or other secure enclave, and then we wouldn't have to worry about them being copied elsewhere. But that wouldn't help us if the client is compromised and the attacker simply keeps using the compromised client. The entire model of simply proving knowledge of a secret being sufficient to gain access to a resource is inherently incompatible with a desire for fine-grained trust verification on every access, but I don't see anything changing until we have a standard for third party services to be able to perform that trust verification against a customer's policy.

Still, at least this means I can just run weird Android IoT apps through mitmproxy, pull the bearer token out of the request headers and then start poking the remote API with curl. It may all be broken, but it's also got me a bunch of bug bounty credit, so, it;s impossible to say if its bad or not,

(Addendum: this suggestion that we solve the hardware binding problem by simply passing all the network traffic through some sort of local enclave that could see tokens being set and would then sequester them and reinject them into later requests is OBVIOUSLY HORRIFYING and is also probably going to be at least three startup pitches by the end of next week)

comments

31 March 2022

Matthew Garrett: ZTA doesn't solve all problems, but partial implementations solve fewer

Traditional network access controls work by assuming that something is trustworthy based on some other factor - for example, if a computer is on your office network, it's trustworthy because only trustworthy people should be able to gain physical access to plug something in. If you restrict access to your services to requests coming from trusted networks, then you can assert that it's coming from a trusted device.

Of course, this isn't necessarily true. A machine on your office network may be compromised. An attacker may obtain valid VPN credentials. Someone could leave a hostile device plugged in under a desk in a meeting room. Trust is being placed in devices that may not be trustworthy.

A Zero Trust Architecture (ZTA) is one where a device is granted no inherent trust. Instead, each access to a service is validated against some policy - if the policy is satisfied, the access is permitted. A typical implementation involves granting each device some sort of cryptographic identity (typically a TLS client certificate) and placing the protected services behind a proxy. The proxy verifies the device identity, queries another service to obtain the current device state (we'll come back to that in a moment), compares the state against a policy and either pass the request through to the service or reject it. Different services can have different policies (eg, you probably want a lax policy around whatever's hosting the documentation for how to fix your system if it's being refused access to something for being in the wrong state), and if you want you can also tie it to proof of user identity in some way.

From a user perspective, this is entirely transparent. The proxy is made available on the public internet, DNS for the services points to the proxy, and every time your users try to access the service they hit the proxy instead and (if everything's ok) gain access to it no matter which network they're on. There's no need to connect to a VPN first, and there's no worries about accidentally leaking information over the public internet instead of over a secure link.

It's also notable that traditional solutions tend to be all-or-nothing. If I have some services that are more sensitive than others, the only way I can really enforce this is by having multiple different VPNs and only granting access to sensitive services from specific VPNs. This obviously risks combinatorial explosion once I have more than a couple of policies, and it's a terrible user experience.

Overall, ZTA approaches provide more security and an improved user experience. So why are we still using VPNs? Primarily because this is all extremely difficult. Let's take a look at an extremely recent scenario. A device used by customer support technicians was compromised. The vendor in question has a solution that can tie authentication decisions to whether or not a device has a cryptographic identity. If this was in use, and if the cryptographic identity was tied to the device hardware (eg, by being generated in a TPM), the attacker would not simply be able to obtain the user credentials and log in from their own device. This is good - if the attacker wanted to maintain access to the service, they needed to stay on the device in question. This increases the probability of the monitoring tooling on the compromised device noticing them.

Unfortunately, the attacker simply disabled the monitoring tooling on the compromised device. If device state was being verified on each access then this would be noticed before too long - the last data received from the device would be flagged as too old, and the requests would no longer satisfy any reasonable access control policy. Instead, the device was assumed to be trustworthy simply because it could demonstrate its identity. There's an important point here: just because a device belongs to you doesn't mean it's a trustworthy device.

So, if ZTA approaches are so powerful and user-friendly, why aren't we all using one? There's a few problems, but the single biggest is that there's no standardised way to verify device state in any meaningful way. Remote Attestation can both prove device identity and the device boot state, but the only product on the market that does much with this is Microsoft's Device Health Attestation. DHA doesn't solve the broader problem of also reporting runtime state - it may be able to verify that endpoint monitoring was launched, but it doesn't make assertions about whether it's still running. Right now, people are left trying to scrape this information from whatever tooling they're running. The absence of any standardised approach to this problem means anyone who wants to deploy a strong ZTA has to integrate with whatever tooling they're already running, and that then increases the cost of migrating to any other tooling later.

But even device identity is hard! Knowing whether a machine should be given a certificate or not depends on knowing whether or not you own it, and inventory control is a surprisingly difficult problem in a lot of environments. It's not even just a matter of whether a machine should be given a certificate in the first place - if a machine is reported as lost or stolen, its trust should be revoked. Your inventory system needs to tie into your device state store in order to ensure that your proxies drop access.

And, worse, all of this depends on you being able to put stuff behind a proxy in the first place! If you're using third-party hosted services, that's a problem. In the absence of a proxy, trust decisions are probably made at login time. It's possible to tie user auth decisions to device identity and state (eg, a self-hosted SAML endpoint could do that before passing through to the actual ID provider), but that's still going to end up providing a bearer token of some sort that can potentially be exfiltrated, and will continue to be trusted even if the device state becomes invalid.

ZTA doesn't solve all problems, and there isn't a clear path to it doing so without significantly greater industry support. But a complete ZTA solution is significantly more powerful than a partial one. Verifying device identity is a step on the path to ZTA, but in the absence of device state verification it's only a step.

comments

23 March 2022

Matthew Garrett: AMD's Pluton implementation seems to be controllable

I've been digging through the firmware for an AMD laptop with a Ryzen 6000 that incorporates Pluton for the past couple of weeks, and I've got some rough conclusions. Note that these are extremely preliminary and may not be accurate, but I'm going to try to encourage others to look into this in more detail. For those of you at home, I'm using an image from here, specifically version 309. The installer is happy to run under Wine, and if you tell it to "Extract" rather than "Install" it'll leave a file sitting in C:\\DRIVERS\ASUS_GA402RK_309_BIOS_Update_20220322235241 which seems to have an additional 2K of header on it. Strip that and you should have something approximating a flash image.

Looking for UTF16 strings in this reveals something interesting:

Pluton (HSP) X86 Firmware Support Enable/Disable X86 firmware HSP related code path, including AGESA HSP module, SBIOS HSP related drivers. Auto - Depends on PcdAmdHspCoreEnable build value NOTE: PSP directory entry 0xB BIT36 have the highest priority. NOTE: This option will NOT put HSP hardware in disable state, to disable HSP hardware, you need setup PSP directory entry 0xB, BIT36 to 1. // EntryValue[36] = 0: Enable, HSP core is enabled. // EntryValue[36] = 1: Disable, HSP core is disabled then PSP will gate the HSP clock, no further PSP to HSP commands. System will boot without HSP.
"HSP" here means "Hardware Security Processor" - a generic term that refers to Pluton in this case. This is a configuration setting that determines whether Pluton is "enabled" or not - my interpretation of this is that it doesn't directly influence Pluton, but disables all mechanisms that would allow the OS to communicate with it. In this scenario, Pluton has its firmware loaded and could conceivably be functional if the OS knew how to speak to it directly, but the firmware will never speak to it itself. I took a quick look at the Windows drivers for Pluton and it looks like they won't do anything unless the firmware wants to expose Pluton, so this should mean that Windows will do nothing.

So what about the reference to "PSP directory entry 0xB BIT36 have the highest priority"? The PSP is the AMD Platform Security Processor - it's an ARM core on the CPU package that boots before the x86. The PSP firmware lives in the same flash image as the x86 firmware, so the PSP looks for a header that points it towards the firmware it should execute. This gives a pointer to a "directory" - a list of different object types and where they're located in flash (there's a description of this for slightly older AMDs here). Type 0xb is treated slightly specially. Where most types contain the address of where the actual object is, type 0xb contains a 64-bit value that's interpreted as enabling or disabling various features - something AMD calls "soft fusing" (Intel have something similar that involves setting bits in the Firmware Interface Table). The PSP looks at the bits that are set here and alters its behaviour. If bit 36 is set, the PSP tells Pluton to turn itself off and will no longer send any commands to it.

So, we have two mechanisms to disable Pluton - the PSP can tell it to turn itself off, or the x86 firmware can simply never speak to it or admit that it exists. Both of these imply that Pluton has started executing before it's shut down, so it's reasonable to wonder whether it can still do stuff. In the image I'm looking at, there's a blob starting at 0x0069b610 that appears to be firmware for Pluton - it contains chunks that appear to be the reference TPM2 implementation, and it broadly decompiles as valid ARM code. It should be viable to figure out whether it can do anything in the face of being "disabled" via either of the above mechanisms.

Unfortunately for me, the system I'm looking at does set bit 36 in the 0xb entry - as a result, Pluton is disabled before x86 code starts running and I can't investigate further in any straightforward way. The implication that the user-controllable mechanism for disabling Pluton merely disables x86 communication with it rather than turning it off entirely is a little concerning, although (assuming Pluton is behaving as a TPM rather than having an enhanced set of capabilities) skipping any firmware communication means the OS has no way to know what happened before it started running even if it has a mechanism to communicate with Pluton without firmware assistance. In that scenario it'd be viable to write a bootloader shim that just faked up the firmware measurements before handing control to the OS.

The bit 36 disabling mechanism seems more solid? Again, it should be possible to analyse the Pluton firmware to determine whether it actually pays attention to a disable command being sent. But even if it chooses to ignore that, if the PSP is in a position to just cut the clock to Pluton, it's not going to be able to do a lot. At that point we're trusting AMD rather than trusting Microsoft, but given that you're also trusting AMD to execute the code you're giving them to execute, it's hard to avoid placing trust in them.

Overall: I'm reasonably confident that systems that ship with Pluton disabled via setting bit 36 in the soft fuses are going to disable it sufficiently hard that the OS can't do anything about it. Systems that give the user an option to enable or disable it are a little less clear in that respect, and it's possible (but not yet demonstrated) that an OS could communicate with Pluton anyway. However, if that's true, and if the firmware never communicates with Pluton itself, the user could install a stub loader in UEFI that mimicks the firmware behaviour and leaves the OS thinking everything was good when it absolutely is not.

So, assuming that Pluton in its current form on AMD has no capabilities outside those we know about, the disabling mechanisms are probably good enough. It's tough to make a firm statement on this before I have access to a system that doesn't just disable it immediately, so stay tuned for updates.

comments

17 January 2022

Matthew Garrett: Boot Guard and PSB have user-hostile defaults

Compromising an OS without it being detectable is hard. Modern operating systems support the imposition of a security policy or the launch of some sort of monitoring agent sufficient early in boot that even if you compromise the OS, you're probably going to have left some sort of detectable trace[1]. You can avoid this by attacking the lower layers - if you compromise the bootloader then it can just hotpatch a backdoor into the kernel before executing it, for instance.

This is avoided via one of two mechanisms. Measured boot (such as TPM-based Trusted Boot) makes a tamper-proof cryptographic record of what the system booted, with each component in turn creating a measurement of the next component in the boot chain. If a component is tampered with, its measurement will be different. This can be used to either prevent the release of a cryptographic secret if the boot chain is modified (for instance, using the TPM to encrypt the disk encryption key), or can be used to attest the boot state to another device which can tell you whether you're safe or not. The other approach is verified boot (such as UEFI Secure Boot), where each component in the boot chain verifies the next component before executing it. If the verification fails, execution halts.

In both cases, each component in the boot chain measures and/or verifies the next. But something needs to be the first link in this chain, and traditionally this was the system firmware. Which means you could tamper with the system firmware and subvert the entire process - either have the firmware patch the bootloader in RAM after measuring or verifying it, or just load a modified bootloader and lie about the measurements or ignore the verification. Attackers had already been targeting the firmware (Hacking Team had something along these lines, although this was pre-secure boot so just dropped a rootkit into the OS), and given a well-implemented measured and verified boot chain, the firmware becomes an even more attractive target.

Intel's Boot Guard and AMD's Platform Secure Boot attempt to solve this problem by moving the validation of the core system firmware to an (approximately) immutable environment. Intel's solution involves the Management Engine, a separate x86 core integrated into the motherboard chipset. The ME's boot ROM verifies a signature on its firmware before executing it, and once the ME is up it verifies that the system firmware's bootblock is signed using a public key that corresponds to a hash blown into one-time programmable fuses in the chipset. What happens next depends on policy - it can either prevent the system from booting, allow the system to boot to recover the firmware but automatically shut it down after a while, or flag the failure but allow the system to boot anyway. Most policies will also involve a measurement of the bootblock being pushed into the TPM.

AMD's Platform Secure Boot is slightly different. Rather than the root of trust living in the motherboard chipset, it's in AMD's Platform Security Processor which is incorporated directly onto the CPU die. Similar to Boot Guard, the PSP has ROM that verifies the PSP's own firmware, and then that firmware verifies the system firmware signature against a set of blown fuses in the CPU. If that fails, system boot is halted. I'm having trouble finding decent technical documentation about PSB, and what I have found doesn't mention measuring anything into the TPM - if this is the case, PSB only implements verified boot, not measured boot.

What's the practical upshot of this? The first is that you can't replace the system firmware with anything that doesn't have a valid signature, which effectively means you're locked into firmware the vendor chooses to sign. This prevents replacing the system firmware with either a replacement implementation (such as Coreboot) or a modified version of the original implementation (such as firmware that disables locking of CPU functionality or removes hardware allowlists). In this respect, enforcing system firmware verification works against the user rather than benefiting them.
Of course, it also prevents an attacker from doing the same thing, but while this is a real threat to some users, I think it's hard to say that it's a realistic threat for most users.

The problem is that vendors are shipping with Boot Guard and (increasingly) PSB enabled by default. In the AMD case this causes another problem - because the fuses are in the CPU itself, a CPU that's had PSB enabled is no longer compatible with any motherboards running firmware that wasn't signed with the same key. If a user wants to upgrade their system's CPU, they're effectively unable to sell the old one. But in both scenarios, the user's ability to control what their system is running is reduced.

As I said, the threat that these technologies seek to protect against is real. If you're a large company that handles a lot of sensitive data, you should probably worry about it. If you're a journalist or an activist dealing with governments that have a track record of targeting people like you, it should probably be part of your threat model. But otherwise, the probability of you being hit by a purely userland attack is so ludicrously high compared to you being targeted this way that it's just not a big deal.

I think there's a more reasonable tradeoff than where we've ended up. Tying things like disk encryption secrets to TPM state means that if the system firmware is measured into the TPM prior to being executed, we can at least detect that the firmware has been tampered with. In this case nothing prevents the firmware being modified, there's just a record in your TPM that it's no longer the same as it was when you encrypted the secret. So, here's what I'd suggest:

1) The default behaviour of technologies like Boot Guard or PSB should be to measure the firmware signing key and whether the firmware has a valid signature into PCR 7 (the TPM register that is also used to record which UEFI Secure Boot signing key is used to verify the bootloader).
2) If the PCR 7 value changes, the disk encryption key release will be blocked, and the user will be redirected to a key recovery process. This should include remote attestation, allowing the user to be informed that their firmware signing situation has changed.
3) Tooling should be provided to switch the policy from merely measuring to verifying, and users at meaningful risk of firmware-based attacks should be encouraged to make use of this tooling

This would allow users to replace their system firmware at will, at the cost of having to re-seal their disk encryption keys against the new TPM measurements. It would provide enough information that, in the (unlikely for most users) scenario that their firmware has actually been modified without their knowledge, they can identify that. And it would allow users who are at high risk to switch to a higher security state, and for hardware that is explicitly intended to be resilient against attacks to have different defaults.

This is frustratingly close to possible with Boot Guard, but I don't think it's quite there. Before you've blown the Boot Guard fuses, the Boot Guard policy can be read out of flash. This means that you can drop a Boot Guard configuration into flash telling the ME to measure the firmware but not prevent it from running. But there are two problems remaining:

1) The measurement is made into PCR 0, and PCR 0 changes every time your firmware is updated. That makes it a bad default for sealing encryption keys.
2) It doesn't look like the policy is measured before being enforced. This means that an attacker can simply reflash modified firmware with a policy that disables measurement and then make a fake measurement that makes it look like the firmware is ok.

Fixing this seems simple enough - the Boot Guard policy should always be measured, and measurements of the policy and the signing key should be made into a PCR other than PCR 0. If an attacker modified the policy, the PCR value would change. If an attacker modified the firmware without modifying the policy, the PCR value would also change. People who are at high risk would run an app that would blow the Boot Guard policy into fuses rather than just relying on the copy in flash, and enable verification as well as measurement. Now if an attacker tampers with the firmware, the system simply refuses to boot and the attacker doesn't get anything.

Things are harder on the AMD side. I can't find any indication that PSB supports measuring the firmware at all, which obviously makes this approach impossible. I'm somewhat surprised by that, and so wouldn't be surprised if it does do a measurement somewhere. If it doesn't, there's a rather more significant problem - if a system has a socketed CPU, and someone has sufficient physical access to replace the firmware, they can just swap out the CPU as well with one that doesn't have PSB enabled. Under normal circumstances the system firmware can detect this and prompt the user, but given that the attacker has just replaced the firmware we can assume that they'd do so with firmware that doesn't decide to tell the user what just happened. In the absence of better documentation, it's extremely hard to say that PSB actually provides meaningful security benefits.

So, overall: I think Boot Guard protects against a real-world attack that matters to a small but important set of targets. I think most of its benefits could be provided in a way that still gave users control over their system firmware, while also permitting high-risk targets to opt-in to stronger guarantees. Based on what's publicly documented about PSB, it's hard to say that it provides real-world security benefits for anyone at present. In both cases, what's actually shipping reduces the control people have over their systems, and should be considered user-hostile.

[1] Assuming that someone's both turning this on and actually looking at the data produced

comments

9 January 2022

Matthew Garrett: Pluton is not (currently) a threat to software freedom

At CES this week, Lenovo announced that their new Z-series laptops would ship with AMD processors that incorporate Microsoft's Pluton security chip. There's a fair degree of cynicism around whether Microsoft have the interests of the industry as a whole at heart or not, so unsurprisingly people have voiced concerns about Pluton allowing for platform lock-in and future devices no longer booting non-Windows operating systems. Based on what we currently know, I think those concerns are understandable but misplaced.

But first it's helpful to know what Pluton actually is, and that's hard because Microsoft haven't actually provided much in the way of technical detail. The best I've found is a discussion of Pluton in the context of Azure Sphere, Microsoft's IoT security platform. This, in association with the block diagrams on page 12 and 13 of this slidedeck, suggest that Pluton is a general purpose security processor in a similar vein to Google's Titan chip. It has a relatively low powered CPU core, an RNG, and various hardware cryptography engines - there's nothing terribly surprising here, and it's pretty much the same set of components that you'd find in a standard Trusted Platform Module of the sort shipped in pretty much every modern x86 PC. But unlike Titan, Pluton seems to have been designed with the explicit goal of being incorporated into other chips, rather than being a standalone component. In the Azure Sphere case, we see it directly incorporated into a Mediatek chip. In the Xbox Series devices, it's incorporated into the SoC. And now, we're seeing it arrive on general purpose AMD CPUs.

Microsoft's announcement says that Pluton can be shipped in three configurations:as the Trusted Platform Module; as a security processor used for non-TPM scenarios like platform resiliency; or OEMs can choose to ship with Pluton turned off. What we're likely to see to begin with is the former - Pluton will run firmware that exposes a Trusted Computing Group compatible TPM interface. This is almost identical to the status quo. Microsoft have required that all Windows certified hardware ship with a TPM for years now, but for cost reasons this is often not in the form of a separate hardware component. Instead, both Intel and AMD provide support for running the TPM stack on a component separate from the main execution cores on the system - for Intel, this TPM code runs on the Management Engine integrated into the chipset, and for AMD on the Platform Security Processor that's integrated into the CPU package itself.

So in this respect, Pluton changes very little; the only difference is that the TPM code is running on hardware dedicated to that purpose, rather than alongside other code. Importantly, in this mode Pluton will not do anything unless the system firmware or OS ask it to. Pluton cannot independently block the execution of any other code - it knows nothing about the code the CPU is executing unless explicitly told about it. What the OS can certainly do is ask Pluton to verify a signature before executing code, but the OS could also just verify that signature itself. Windows can already be configured to reject software that doesn't have a valid signature. If Microsoft wanted to enforce that they could just change the default today, there's no need to wait until everyone has hardware with Pluton built-in.

The two things that seem to cause people concerns are remote attestation and the fact that Microsoft will be able to ship firmware updates to Pluton via Windows Update. I've written about remote attestation before, so won't go into too many details here, but the short summary is that it's a mechanism that allows your system to prove to a remote site that it booted a specific set of code. What's important to note here is that the TPM (Pluton, in the scenario we're talking about) can't do this on its own - remote attestation can only be triggered with the aid of the operating system. Microsoft's Device Health Attestation is an example of remote attestation in action, and the technology definitely allows remote sites to refuse to grant you access unless you booted a specific set of software. But there are two important things to note here: first, remote attestation cannot prevent you from booting whatever software you want, and second, as evidenced by Microsoft already having a remote attestation product, you don't need Pluton to do this! Remote attestation has been possible since TPMs started shipping over two decades ago.

The other concern is Microsoft having control over the firmware updates. The context here is that TPMs are not magically free of bugs, and sometimes these can have security consequences. One example is Infineon TPMs producing weak RSA keys, a vulnerability that could be rectified by a firmware update to the TPM. Unfortunately these updates had to be issued by the device manufacturer rather than Infineon being able to do so directly. This meant users had to wait for their vendor to get around to shipping an update, something that might not happen at all if the machine was sufficiently old. From a security perspective, being able to ship firmware updates for the TPM without them having to go through the device manufacturer is a huge win.

Microsoft's obviously in a position to ship a firmware update that modifies the TPM's behaviour - there would be no technical barrier to them shipping code that resulted in the TPM just handing out your disk encryption secret on demand. But Microsoft already control the operating system, so they already have your disk encryption secret. There's no need for them to backdoor the TPM to give them something that the TPM's happy to give them anyway. If you don't trust Microsoft then you probably shouldn't be running Windows, and if you're not running Windows Microsoft can't update the firmware on your TPM.

So, as of now, Pluton running firmware that makes it look like a TPM just isn't a terribly interesting change to where we are already. It can't block you running software (either apps or operating systems). It doesn't enable any new privacy concerns. There's no mechanism for Microsoft to forcibly push updates to it if you're not running Windows.

Could this change in future? Potentially. Microsoft mention another use-case for Pluton "as a security processor used for non-TPM scenarios like platform resiliency", but don't go into any more detail. At this point, we don't know the full set of capabilities that Pluton has. Can it DMA? Could it play a role in firmware authentication? There are scenarios where, in theory, a component such as Pluton could be used in ways that would make it more difficult to run arbitrary code. It would be reassuring to hear more about what the non-TPM scenarios are expected to look like and what capabilities Pluton actually has.

But let's not lose sight of something more fundamental here. If Microsoft wanted to block free operating systems from new hardware, they could simply mandate that vendors remove the ability to disable secure boot or modify the key databases. If Microsoft wanted to prevent users from being able to run arbitrary applications, they could just ship an update to Windows that enforced signing requirements. If they want to be hostile to free software, they don't need Pluton to do it.

(Edit: it's been pointed out that I kind of gloss over the fact that remote attestation is a potential threat to free software, as it theoretically allows sites to block access based on which OS you're running. There's various reasons I don't think this is realistic - one is that there's just way too much variability in measurements for it to be practical to write a policy that's strict enough to offer useful guarantees without also blocking a number of legitimate users, and the other is that you can just pass the request through to a machine that is running the appropriate software and have it attest for you. The fact that nobody has actually bothered to use remote attestation for this purpose even though most consumer systems already ship with TPMs suggests that people generally agree with me on that)

comments

31 December 2021

Matthew Garrett: Update on Linux hibernation support when lockdown is enabled

Some time back I wrote up a description of my proposed (and implemented) solution for making hibernation work under Linux even within the bounds of the integrity model. It's been a while, so here's an update.

The first is that localities just aren't an option. It turns out that they're optional in the spec, and TPMs are entirely permitted to say they don't support them. The only time they're likely to work is on platforms that support DRTM implementations like TXT. Most consumer hardware doesn't fall into that category, so we don't get to use that solution. Unfortunate, but, well.

The second is that I'd ignored an attack vector. If the kernel is configured to restrict access to PCR 23, then yes, an attacker is never able to modify PCR 23 to be in the same state it would be if hibernation were occurring and the key certification data will fail to validate. Unfortunately, an attacker could simply boot into an older kernel that didn't implement the PCR 23 restriction, and could fake things up there (yes, this is getting a bit convoluted, but the entire point here is to make this impossible rather than just awkward). Once PCR 23 was in the correct state, they would then be able to write out a new swap image, boot into a new kernel that supported the secure hibernation solution, and have that resume successfully in the (incorrect) belief that the image was written out in a secure environment.

This felt like an awkward problem to fix. We need to be able to distinguish between the kernel having modified the PCRs and userland having modified the PCRs, and we need to be able to do this without modifying any kernels that have already been released[1]. The normal approach to determining whether an event occurred in a specific phase of the boot process is to "cap" the PCR - extend it with a known value that indicates a transition between stages of the boot process. Any events that occur before the cap event must have occurred in the previous stage of boot, and since the final PCR value depends on the order of measurements and not just the contents of those measurements, if a PCR is capped before userland runs, userland can't fake the same PCR value afterwards. If Linux capped a PCR before userland started running, we'd be able to place a measurement there before the cap occurred and then prove that that extension occurred before userland had the opportunity to interfere. We could simply place a statement that the kernel supported the PCR 23 restrictions there, and we'd be fine.

Unfortunately Linux doesn't currently do this, and adding support for doing so doesn't fix the problem - if an attacker boots a kernel that doesn't cap a PCR, they can just cap it themselves from userland. So, we're faced with the same problem: booting an older kernel allows the system to be placed in an identical state to the current kernel, and a fake hibernation image can be written out. Solving this required a PCR that was being modified after kernel code was running, but before userland was started, even with existing kernels.

Thankfully, there is one! PCR 5 is defined as containing measurements related to boot management configuration and data. One of the measurements it contains is the result of the UEFI ExitBootServices() call. ExitBootServices() is called at the transition from the UEFI boot environment to the running OS, and the kernel contains code that executes before it. So, if we measure an assertion regarding whether or not we support restricted access to PCR 23 into PCR 5 before we call ExitBootServices(), this will prevent userspace from spoofing us (because userspace will only be able to extend PCR 5 after the firmware extended PCR 5 in response to ExitBootServices() being called). Obviously this depends on the firmware actually performing the PCR 5 extension when ExitBootServices() is called, but if firmware's out of spec then I don't think there's any real expectation of it being secure enough for any of this to buy you anything anyway.

My current tree is here, but there's a couple of things I want to do before submitting it, including ensuring that the key material is wiped from RAM after use (otherwise it could potentially be scraped out and used to generate another image afterwards) and, uh, actually making sure this works (I no longer have the machine I was previously using for testing, and switching my other dev machine over to TPM 2 firmware is proving troublesome, so I need to pull another machine out of the stack and reimage it).

[1] The linear nature of time makes feature development much more frustrating

comments

30 November 2021

Russell Coker: Links November 2021

The Guardian has an amusing article by Sophie Elmhirst about Libertarians buying a cruise ship to make a seasteading project off the coast of Panama [1]. It turns out that you need permits etc to do this and maintaining a ship is expensive. Also you wouldn t want to mine cryptocurrency in a ship cabin as most cabins are small and don t have enough airconditioning to remain pleasant if you dump 1kW or more into the air. NPR has an interesting article about the reaction of the NRA to the Columbine shootings [2]. Seems that some NRA person isn t a total asshole and is sharing their private information, maybe they are dying and are worried about going to hell. David Brin wrote an insightful blog post about the singleton hypothesis where he covers some of the evidence of autocratic societies failing [3]. I think he makes a convincing point about a single centralised government for human society not being viable. But something like the EU on a world wide scale could work well. Ken Shirriff wrote an interesting blog post about reverse engineering the Yamaha DX7 synthesiser [4]. The New York Times has an interesting article about a Baboon troop that became less aggressive after the alpha males all died at once from tuberculosis [5]. They established a new more peaceful culture that has outlived the beta males who avoided tuberculosis. The Guardian has an interesting article about how sequencing the genomes of the entire population can save healthcare costs while improving the health of the population [6]. This is somthing wealthy countries should offer for free to the world population. At a bit under $1000 per test that s only about $7 trillion to test everyone, and of course the price should drop significantly if there were billions of tests being done. The Strategy Bridge has an interesting article about SciFi books that have useful portrayals of military strategy [7]. The co-author is Major General Mick Ryan of the Australian Army which is noteworthy as Major General is the second highest rank in use by the Australian Army at this time. Vice has an interesting article about the co-evolution of penises and vaginas and how a lot of that evolution is based on avoiding impregnation from rape [8]. Cory Doctorow wrote an insightful Medium article about the way that governments could force interoperability through purchasing power [9]. Cory Doctorow wrote an insightful article for Locus Magazine about imagining life after capitalism and how capitalism might be replaced [10]. We need a Star Trek future! Arstechnica has an informative article about new developmenet in the rowhammer category of security attacks on DRAM [11]. It seems that DDR4 with ECC is the best current mitigation technique and that DDR3 with ECC is harder to attack than non-ECC RAM. So the thing to do is use ECC on all workstations and avoid doing security critical things on laptops because they can t juse ECC RAM.

21 September 2021

Russell Coker: Links September 2021

Matthew Garrett wrote an interesting and insightful blog post about the license of software developed or co-developed by machine-learning systems [1]. One of his main points is that people in the FOSS community should aim for less copyright protection. The USENIX ATC 21/OSDI 21 Joint Keynote Address titled It s Time for Operating Systems to Rediscover Hardware has some inssightful points to make [2]. Timothy Roscoe makes some incendiaty points but backs them up with evidence. Is Linux really an OS? I recommend that everyone who s interested in OS design watch this lecture. Cory Doctorow wrote an interesting set of 6 articles about Disneyland, ride pricing, and crowd control [3]. He proposes some interesting ideas for reforming Disneyland. Benjamin Bratton wrote an insightful article about how philosophy failed in the pandemic [4]. He focuses on the Italian philosopher Giorgio Agamben who has a history of writing stupid articles that match Qanon talking points but with better language skills. Arstechnica has an interesting article about penetration testers extracting an encryption key from the bus used by the TPM on a laptop [5]. It s not a likely attack in the real world as most networks can be broken more easily by other methods. But it s still interesting to learn about how the technology works. The Portalist has an article about David Brin s Startide Rising series of novels and his thought s on the concept of Uplift (which he denies inventing) [6]. Jacobin has an insightful article titled You re Not Lazy But Your Boss Wants You to Think You Are [7]. Making people identify as lazy is bad for them and bad for getting them to do work. But this is the first time I ve seen it described as a facet of abusive capitalism. Jacobin has an insightful article about free public transport [8]. Apparently there are already many regions that have free public transport (Tallinn the Capital of Estonia being one example). Fare free public transport allows bus drivers to concentrate on driving not taking fares, removes the need for ticket inspectors, and generally provides a better service. It allows passengers to board buses and trams faster thus reducing traffic congestion and encourages more people to use public transport instead of driving and reduces road maintenance costs. Interesting research from Israel about bypassing facial ID [9]. Apparently they can make a set of 9 images that can pass for over 40% of the population. I didn t expect facial recognition to be an effective form of authentication, but I didn t expect it to be that bad. Edward Snowden wrote an insightful blog post about types of conspiracies [10]. Kevin Rudd wrote an informative article about Sky News in Australia [11]. We need to have a Royal Commission now before we have our own 6th Jan event. Steve from Big Mess O Wires wrote an informative blog post about USB-C and 4K 60Hz video [12]. Basically you can t have a single USB-C hub do 4K 60Hz video and be a USB 3.x hub unless you have compression software running on your PC (slow and only works on Windows), or have DisplayPort 1.4 or Thunderbolt (both not well supported). All of the options are not well documented on online store pages so lots of people will get unpleasant surprises when their deliveries arrive. Computers suck. Steinar H. Gunderson wrote an informative blog post about GaN technology for smaller power supplies [13]. A 65W USB-C PSU that fits the usual wall wart form factor is an interesting development.

13 July 2021

Matthew Garrett: Does free software benefit from ML models being derived works of training data?

Github recently announced Copilot, a machine learning system that makes suggestions for you when you're writing code. It's apparently trained on all public code hosted on Github, which means there's a lot of free software in its training set. Github assert that the output of Copilot belongs to the user, although they admit that it may occasionally produce output that is identical to content from the training set.

Unsurprisingly, this has led to a number of questions along the lines of "If Copilot embeds code that is identical to GPLed training data, is my code now GPLed?". This is extremely understandable, but the underlying issue is actually more general than that. Even code under permissive licenses like BSD requires retention of copyright notices and disclaimers, and failing to include them is just as much a copyright violation as incorporating GPLed code into a work and not abiding by the terms of the GPL is.

But free software licenses only have power to the extent that copyright permits them to. If your code isn't a derived work of GPLed material, you have no obligation to follow the terms of the GPL. Github clearly believe that Copilot's output doesn't count as a derived work as far as US copyright law goes, and as a result the licenses on the training data don't apply to the output. Some people have interpreted this as an attack on free software - Copilot may insert code that's either identical or extremely similar to GPLed code, and claim that there are no license obligations created as a result, effectively allowing the laundering of GPLed code into proprietary software.

I'm completely unqualified to hold a strong opinion on whether Github's legal position is justifiable or not, and right now I'm also not interested in thinking about it too much. What I think is more interesting is what the impact of either position has on free software. Do we benefit more from a future where the output of Copilot (or similar projects) is considered a derived work of the training data, or one where it isn't? Having been involved in a bunch of GPL enforcement activities, it's very easy to think of this as something that weakens the GPL and, as a result, weakens free software. That was my initial reaction, but that's shifted over the past few days.

Let's look at the GNU manifesto, specifically this section:

The fact that the easiest way to copy a program is from one neighbor to another, the fact that a program has both source code and object code which are distinct, and the fact that a program is used rather than read and enjoyed, combine to create a situation in which a person who enforces a copyright is harming society as a whole both materially and spiritually; in which a person should not do so regardless of whether the law enables him to.

The GPL makes use of copyright law to ensure that GPLed work can't be taken from the commons. Anyone who produces a derived work of GPLed code is obliged to provide that work under the same terms. If software weren't copyrightable, the GPL would have no power. But this is the outcome Stallman wanted! The GPL doesn't exist because copyright is good, it exists because software being copyrightable is what enables the concept of proprietary software in the first place.

The powers that the GPL uses to enforce sharing of code are used by the authors of proprietary software to reduce that sharing. They attempt to forbid us from examining their code to determine how it works - they argue that anyone who does so is tainted, unable to contribute similar code to free software projects in case they produce a derived work of the original. Broadly speaking, the further the definition of a derived work reaches, the greater the power of proprietary software authors. If Oracle's argument that APIs are copyrightable had prevailed, it would have been disastrous for free software. If the Apple look and feel suit had established that Microsoft infringed Apple's copyright, we might be living in a future where we had no free software desktop environments.

When we argue for an interpretation of copyright law that enhances the power of the GPL, we're also enhancing the power of giant corporations with a lot of lawyers on hand. So let's look at this another way. If Github's interpretation of copyright law holds, we can train a model on proprietary code and extract concepts without having to worry about being tainted. The proprietary code itself won't enter the commons, but the ideas it embodies will. No more worries about whether you're literally copying the code that implements an algorithm you want to duplicate - simply start typing and let the model remove the risk for you.

There's a reasonable counter argument about equality here. How much GPL-influenced code is going to end up in proprietary projects when compared to the reverse? It's not an easy question to answer, but we should bear in mind that the majority of public repositories on Github aren't under an open source license. Copilot is already claiming to give us access to the concepts embodied in those repositories. Do these provide more value than is given up? I honestly don't know how to measure that. But what I do know is that free software was founded in a belief that software shouldn't be constrained by copyright, and our default stance shouldn't be to argue against the idea that copyright is weaker than we imagined.

(Edit: this post by Julia Reda makes some of the same arguments, but spends some more time focusing on a legal analysis of why having copyright cover the output of Copilot would be a problem)

comments

4 June 2021

Matthew Garrett: Mike Lindell's Cyber "Evidence"

Mike Lindell, notable for absolutely nothing relevant in this field, today filed a lawsuit against a couple of voting machine manufacturers in response to them suing him for defamation after he claimed that they were covering up hacks that had altered the course of the US election. Paragraph 104 of his suit asserts that he has evidence of at least 20 documented hacks, including the number of votes that were changed. The citation is just a link to a video called Absolute 9-0, which claims to present sufficient evidence that the US supreme court will come to a 9-0 decision that the election was tampered with.

The claim is that Lindell was provided with a set of files on the 9th of January, and gave these to some cyber experts to verify. These experts identified them as packet captures. The video contains scrolling hex, and we are told that this is the raw encrypted data from the files. In reality, the hex values correspond very clearly to printable ASCII, and appear to just be the Pennsylvania voter roll. They're not encrypted, and they're not packet captures (they contain no packet headers).

20 of these packet captures were then selected and analysed, giving us the tables contained within Exhibit 12. The alleged source IPs appear to correspond to the networks the tables claim, and the latitude and longitude presumably just come from a geoip lookup of some sort (although clearly those values are far too precise to be accurate). But if we look at the target IPs, we find something interesting. Most of them resolve to the website for the county that was the nominal target (eg, 198.108.253.104 is www.deltacountymi.org). So, we're supposed to believe that in many cases, the county voting infrastructure was hosted on the county website.

Unfortunately we're not given the destination port, but 198.108.253.104 isn't listening on anything other than 80 and 443. We're told that the packet data is encrypted, so presumably it's over HTTPS. So, uh, how did they decrypt this to figure out how many votes were switched? If Mike's hackers have broken TLS, they really don't need to be dealing with this.

We're also given some background information on how it's impossible to reconstruct packet captures after the fact (untrue), or that modifying them would change their hashes (true, but in the absence of known good hash values that tells us nothing), but it's pretty clear that nothing we're shown actually demonstrates what we're told it does.

In summary: yes, any supreme court decision on this would be 9-0, just not the way he's hoping for.

Update: It was pointed out that this data appears to be part of a larger dataset. This one is even more dubious - it somehow has MAC addresses for both the source and destination (which is impossible), and almost none of these addresses are in actual issued ranges.

comments

2 June 2021

Matthew Garrett: Producing a trustworthy x86-based Linux appliance

Let's say you're building some form of appliance on top of general purpose x86 hardware. You want to be able to verify the software it's running hasn't been tampered with. What's the best approach with existing technology?

Let's split this into two separate problems. The first is to do as much as we can to ensure that the software can't be modified without our consent[1]. This requires that each component in the boot chain verify that the next component is legitimate. We call the first component in this chain the root of trust, and in the x86 world this is the system firmware[2]. This firmware is responsible for verifying the bootloader, and the easiest way to do this on x86 is to use UEFI Secure Boot. In this setup the firmware contains a set of trusted signing certificates and will only boot executables with a chain of trust to one of these certificates. Switching the system into setup mode from the firmware menu will allow you to remove the existing keys and install new ones.

(Note: You shouldn't use the trusted certificate directly for signing bootloaders - instead, the trusted certificate should be used to sign another certificate and the key for that certificate used to sign your bootloader. This way, if you ever need to revoke the signing certificate, you can simply sign a new one with the trusted parent and push out a revocation update instead of having to provision new keys)

But what do you want to sign? In the general purpose Linux world, we use an intermediate bootloader called Shim to bridge from the Microsoft signing authority to a distribution one. Shim then verifies the signature on grub, and grub in turn verifies the signature on the kernel. This is a large body of code that exists because of the use cases that general purpose distributions need to support - primarily, booting on arbitrary off the shelf hardware, and allowing arbitrary and complicated boot setups. This is unnecessary in the appliance case, where the hardware target can be well defined, where there's no need for interoperability with the Microsoft signing authority, and where the boot configuration can be extremely static.

We can skip all of this complexity using systemd-boot's unified Linux image support. This has the format described here, but the short version is that it's simply a kernel and initramfs linked into a small EFI executable that will run them. Instructions for generating such an image are here, and if you follow them you'll end up with a single static image that can be directly executed by the firmware. Signing this avoids dealing with a whole host of problems associated with relying on shim and grub, but note that you'll be embedding the initramfs as well. Again, this should be fine for appliance use-cases, but you'll need your build system to support building the initramfs at image creation time rather than relying on it being generated on the host.

At this point we have a single image that can be verified by the firmware and will get us to the point of a running kernel and initramfs. Unless you've got enough RAM that you can put your entire workload in the initramfs, you're going to want a filesystem as well, and you're going to want to verify that that filesystem hasn't been tampered with. The easiest approach to this is to use dm-verity, a device-mapper layer that uses a hash tree to verify that the filesystem contents haven't been modified. The kernel needs to know what the root hash is, so this can either be embedded into your initramfs image or into the kernel command line. Either way, it'll end up in the signed boot image, so nobody will be able to tamper with it.

It's important to note that a dm-verity partition is read-only - the kernel doesn't have the cryptographic secret that would be required to generate a new hash tree if the partition is modified. So if you require the ability to write data or logs anywhere, you'll need to add a new partition for that. If this partition is unencrypted, an attacker with access to the device will be able to put whatever they want on there. You should treat any data you read from there as untrusted, and ensure that it's validated before use (ie, don't just feed it to a random parser written in C and expect that everything's going to be ok). On the other hand, if it's encrypted, remember that you can't just put the encryption key in the boot image - an attacker with access to the device is going to be able to dump that and extract it. You'll probably want to use a TPM-sealed encryption secret, which will be discussed later on.

At this point everything in the boot process is cryptographically verified, and so should be difficult to tamper with. Unfortunately this isn't really sufficient - on x86 systems there's typically no verification of the integrity of the secure boot database. An attacker with physical access to the system could attach a programmer directly to the firmware flash and rewrite the secure boot database to include keys they control. They could then replace the boot image with one that they've signed, and the machine would happily boot code that the attacker controlled. We need to be able to demonstrate that the system booted using the correct secure boot keys, and the only way we can do that is to use the TPM.

I wrote an introduction to TPMs a while back. The important thing to know here is that the TPM contains a set of Platform Configuration Registers that are large enough to contain a cryptographic hash. During boot, each component of the boot process will generate a "measurement" of other security critical components, including the next component to be booted. These measurements are a representation of the data in question - they may simply be a hash of the object being measured, or the hash of a structure containing various pieces of metadata. Each measurement is passed to the TPM, along with the PCR it should be measured into. The TPM takes the new measurement, appends it to the existing value, and then stores the hash of this concatenated data in the PCR. This means that the final PCR value depends not only on the measurement, but also on every previous measurement. Without breaking the hash algorithm, there's no way to set the PCR to an arbitrary value. The hash values and some associated data are stored in a log that's kept in system RAM, which we'll come back to later.

Different PCRs store different pieces of information, but the one that's most interesting to us is PCR 7. Its use is documented in the TCG PC Client Platform Firmware Profile (section 3.3.4.8), but the short version is that the firmware will measure the secure boot keys that are used to boot the system. If the secure boot keys are altered (such as by an attacker flashing new ones), the PCR 7 value will change.

What can we do with this? There's a couple of choices. For devices that are online, we can perform remote attestation, a process where the device can provide a signed copy of the PCR values to another system. If the system also provides a copy of the TPM event log, the individual events in the log can be replayed in the same way that the TPM would use to calculate the PCR values, and then compared to the actual PCR values. If they match, that implies that the log values are correct, and we can then analyse individual log entries to make assumptions about system state. If a device has been tampered with, the PCR 7 values and associated log entries won't match the expected values, and we can detect the tampering.

If a device is offline, or if there's a need to permit local verification of the device state, we still have options. First, we can perform remote attestation to a local device. I demonstrated doing this over Bluetooth at LCA back in 2020. Alternatively, we can take advantage of other TPM features. TPMs can be configured to store secrets or keys in a way that renders them inaccessible unless a chosen set of PCRs have specific values. This is used in tpm2-totp, which uses a secret stored in the TPM to generate a TOTP value. If the same secret is enrolled in any standard TOTP app, the value generated by the machine can be compared to the value in the app. If they match, the PCR values the secret was sealed to are unmodified. If they don't, or if no numbers are generated at all, that demonstrates that PCR 7 is no longer the same value, and that the system has been tampered with.

Unfortunately, TOTP requires that both sides have possession of the same secret. This is fine when a user is making that association themselves, but works less well if you need some way to ship the secret on a machine and then separately ship the secret to a user. If the user can simply download the secret via some API, so can an attacker. If an attacker has the secret, they can modify the secure boot database and re-seal the secret to the new PCR 7 value. That means having to add some form of authentication, along with a strong binding of machine serial number to a user (in order to avoid someone with valid credentials simply downloading all the secrets).

Instead, we probably want some mechanism that uses asymmetric cryptography. A keypair can be generated on the TPM, which will refuse to release an unencrypted copy of the private key. The public key, however, can be exported and stored. If it's acceptable for a verification app to connect to the internet then the public key can simply be obtained that way - if not, a certificate can be issued to the key, and this exposed to the verifier via a QR code. The app then verifies that the certificate is signed by the vendor, and if so extracts the public key from that. The private key can have an associated policy that only permits its use when PCR 7 has an appropriate value, so the app then generates a nonce and asks the user to type that into the device. The device generates a signature over that nonce and displays that as a QR code. The app verifies the signature matches, and can then assert that PCR 7 has the expected value.

Once we can assert that PCR 7 has the expected value, we can assert that the system booted something signed by us and thus infer that the rest of the boot chain is also secure. But this is still dependent on the TPM obtaining trustworthy information, and unfortunately the bus that the TPM sits on isn't really terribly secure (TPM Genie is an example of an interposer for i2c-connected TPMs, but there's no reason an LPC one can't be constructed to attack the sort usually used on PCs). TPMs do support encrypted communication channels, but bootstrapping those isn't straightforward without firmware support. The easiest way around this is to make use of a firmware-based TPM, where the TPM is implemented in software running on an ancillary controller. Intel's solution is part of their Platform Trust Technology and runs on the Management Engine, AMD run it on the Platform Security Processor. In both cases it's not terribly feasible to intercept the communications, so we avoid this attack. The downside is that we're then placing more trust in components that are running much more code than a TPM would and which have a correspondingly larger attack surface. Which is preferable is going to depend on your threat model.

Most of this should be achievable using Yocto, which now has support for dm-verity built in. It's almost certainly going to be easier using this than trying to base on top of a general purpose distribution. I'd love to see this become a largely push button receive secure image process, so might take a go at that if I have some free time in the near future.

[1] Obviously technologies that can be used to ensure nobody other than me is able to modify the software on devices I own can also be used to ensure that nobody other than the manufacturer is able to modify the software on devices that they sell to third parties. There's no real technological solution to this problem, but we shouldn't allow the fact that a technology can be used in ways that are hostile to user freedom to cause us to reject that technology outright.
[2] This is slightly complicated due to the interactions with the Management Engine (on Intel) or the Platform Security Processor (on AMD). Here's a good writeup on the Intel side of things.

comments

6 May 2021

Matthew Garrett: More doorbell adventures

Back in my last post on this topic, I'd got shell on my doorbell but hadn't figured out why the HTTP callbacks weren't always firing. I still haven't, but I have learned some more things.

Doorbird sell a chime, a network connected device that is signalled by the doorbell when someone pushes a button. It costs about $150, which seems excessive, but would solve my problem (ie, that if someone pushes the doorbell and I'm not paying attention to my phone, I miss it entirely). But given a shell on the doorbell, how hard could it be to figure out how to mimic the behaviour of one?

Configuration for the doorbell is all stored under /mnt/flash, and there's a bunch of files prefixed 1000eyes that contain config (1000eyes is the German company that seems to be behind Doorbird). One of these was called 1000eyes.peripherals, which seemed like a good starting point. The initial contents were "Peripherals":[] , so it seemed likely that it was intended to be JSON. Unfortunately, since I had no access to any of the peripherals, I had no idea what the format was. I threw the main application into Ghidra and found a function that had debug statements referencing "initPeripherals and read a bunch of JSON keys out of the file, so I could simply look at the keys it referenced and write out a file based on that. I did so, and it didn't work - the app stubbornly refused to believe that there were any defined peripherals. The check that was failing was pcVar4 = strstr(local_50[0],PTR_s_"type":"_0007c980);, which made no sense, since I very definitely had a type key in there. And then I read it more closely. strstr() wasn't being asked to look for "type":, it was being asked to look for "type":". I'd left a space between the : and the opening " in the value, which meant it wasn't matching. The rest of the function seems to call an actual JSON parser, so I have no idea why it doesn't just use that for this part as well, but deleting the space and restarting the service meant it now believed I had a peripheral attached.

The mobile app that's used for configuring the doorbell now showed a device in the peripherals tab, but it had a weird corrupted name. Tapping it resulted in an error telling me that the device was unavailable, and on the doorbell itself generated a log message showing it was trying to reach a device with the hostname bha-04f0212c5cca and (unsurprisingly) failing. The hostname was being generated from the MAC address field in the peripherals file and was presumably supposed to be resolved using mDNS, but for now I just threw a static entry in /etc/hosts pointing at my Home Assistant device. That was enough to show that when I opened the app the doorbell was trying to call a CGI script called peripherals.cgi on my fake chime. When that failed, it called out to the cloud API to ask it to ask the chime[1] instead. Since the cloud was completely unaware of my fake device, this didn't work either. I hacked together a simple server using Python's HTTPServer and was able to return data (another block of JSON). This got me to the point where the app would now let me get to the chime config, but would then immediately exit. adb logcat showed a traceback in the app caused by a failed assertion due to a missing key in the JSON, so I ran the app through jadx, found the assertion and from there figured out what keys I needed. Once that was done, the app opened the config page just fine.

Unfortunately, though, I couldn't edit the config. Whenever I hit "save" the app would tell me that the peripheral wasn't responding. This was strange, since the doorbell wasn't even trying to hit my fake chime. It turned out that the app was making a CGI call to the doorbell, and the thread handling that call was segfaulting just after reading the peripheral config file. This suggested that the format of my JSON was probably wrong and that the doorbell was not handling that gracefully, but trying to figure out what the format should actually be didn't seem easy and none of my attempts improved things.

So, new approach. Rather than writing the config myself, why not let the doorbell do it? I should be able to use the genuine pairing process if I could mimic the chime sufficiently well. Hitting the "add" button in the app asked me for the username and password for the chime, so I typed in something random in the expected format (six characters followed by four zeroes) and a sufficiently long password and hit ok. A few seconds later it told me it couldn't find the device, which wasn't unexpected. What was a little more unexpected was that the log on the doorbell was showing it trying to hit another bha-prefixed hostname (and, obviously, failing). The hostname contains the MAC address, but I hadn't told the doorbell the MAC address of the chime, just its username. Some more digging showed that the doorbell was calling out to the cloud API, giving it the 6 character prefix from the username and getting a MAC address back. Doing the same myself revealed that there was a straightforward mapping from the prefix to the mac address - changing the final character from "a" to "b" incremented the MAC by one. It's actually just a base 26 encoding of the MAC, with aaaaaa translating to 00408C000000.

That explained how the hostname was being generated, and in return I was able to work backwards to figure out which username I should use to generate the hostname I was already using. Attempting to add it now resulted in the doorbell making another CGI call to my fake chime in order to query its feature set, and by mocking that up as well I was able to send back a file containing X-Intercom-Type, X-Intercom-TypeId and X-Intercom-Class fields that made the doorbell happy. I now had a valid JSON file, which cleared up a couple of mysteries. The corrupt name was because the name field isn't supposed to be ASCII - it's base64 encoded UTF16-BE. And the reason I hadn't been able to figure out the JSON format correctly was because it looked something like this:
"Peripherals":[] "prefix": "type":"DoorChime","name":"AEQAbwBvAHIAYwBoAGkAbQBlACAAVABlAHMAdA==","mac":"04f0212c5cca","user":"username","password":"password" ]

Note that there's a total of one [ in this file, but two ]s? Awesome. Anyway, I could now modify the config in the app and hit save, and the doorbell would then call out to my fake chime to push config to it. Weirdly, the association between the chime and a specific button on the doorbell is only stored on the chime, not on the doorbell. Further, hitting the doorbell didn't result in any more HTTP traffic to my fake chime. However, it did result in some broadcast UDP traffic being generated. Searching for the port number led me to the Doorbird LAN API and a complete description of the format and encryption mechanism in use. Argon2I is used to turn the first five characters of the chime's password (which is also stored on the doorbell itself) into a 256-bit key, and this is used with ChaCha20 to decrypt the payload. The payload then contains a six character field describing the device sending the event, and then another field describing the event itself. Some more scrappy Python and I could pick up these packets and decrypt them, which showed that they were being sent whenever any event occurred on the doorbell. This explained why there was no storage of the button/chime association on the doorbell itself - the doorbell sends packets for all events, and the chime is responsible for deciding whether to act on them or not.

On closer examination, it turns out that these packets aren't just sent if there's a configured chime. One is sent for each configured user, avoiding the need for a cloud round trip if your phone is on the same network as the doorbell at the time. There was literally no need for me to mimic the chime at all, suitable events were already being sent.

Still. There's a fair amount of WTFery here, ranging from the strstr() based JSON parsing, the invalid JSON, the symmetric encryption that uses device passwords as the key (requiring the doorbell to be aware of the chime's password) and the use of only the first five characters of the password as input to the KDF. It doesn't give me a great deal of confidence in the rest of the device's security, so I'm going to keep playing.

[1] This seems to be to handle the case where the chime isn't on the same network as the doorbell

comments

23 April 2021

Matthew Garrett: An accidental bootsplash

Back in 2005 we had Debconf in Helsinki. Earlier in the year I'd ended up invited to Canonical's Ubuntu Down Under event in Sydney, and one of the things we'd tried to design was a reasonable graphical boot environment that could also display status messages. The design constraints were awkward - we wanted it to be entirely in userland (so we didn't need to carry kernel patches), and we didn't want to rely on vesafb[1] (because at the time we needed to reinitialise graphics hardware from userland on suspend/resume[2], and vesa was not super compatible with that). Nothing currently met our requirements, but by the time we'd got to Helsinki there was a general understanding that Paul Sladen was going to implement this.

The Helsinki Debconf ended being an extremely strange event, involving me having to explain to Mark Shuttleworth what the physics of a bomb exploding on a bus were, many people being traumatised by the whole sauna situation, and the whole unfortunate water balloon incident, but it also involved Sladen spending a bunch of time trying to produce an SVG of a London bus as a D-Bus logo and not really writing our hypothetical userland bootsplash program, so on the last night, fueled by Koff that we'd bought by just collecting all the discarded empty bottles and returning them for the deposits, I started writing one.

I knew that Debian was already using graphics mode for installation despite having a textual installer, because they needed to deal with more complex fonts than VGA could manage. Digging into the code, I found that it used BOGL - a graphics library that made use of the VGA framebuffer to draw things. VGA had a pre-allocated memory range for the framebuffer[3], which meant the firmware probably wouldn't map anything else there any hitting those addresses probably wouldn't break anything. This seemed safe.

A few hours later, I had some code that could use BOGL to print status messages to the screen of a machine booted with vga16fb. I woke up some time later, somehow found myself in an airport, and while sitting at the departure gate[4] I spent a while staring at VGA documentation and worked out which magical calls I needed to make to have it behave roughly like a linear framebuffer. Shortly before I got on my flight back to the UK, I had something that could also draw a graphical picture.

Usplash shipped shortly afterwards. We hit various issues - vga16fb produced a 640x480 mode, and some laptops were not inclined to do that without a BIOS call first. 640x400 worked basically everywhere, but meant we had to redraw the art because circles don't work the same way if you change the resolution. My brief "UBUNTU BETA" artwork that was me literally writing "UBUNTU BETA" on an HP TC1100 shortly after I'd got the Wacom screen working did not go down well, and thankfully we had better artwork before release.

But 16 colours is somewhat limiting. SVGALib offered a way to get more colours and better resolution in userland, retaining our prerequisites. Unfortunately it relied on VM86, which doesn't exist in 64-bit mode on Intel systems. I ended up hacking the X.org x86emu into a thunk library that exposed the same API as LRMI, so we could run it without needing VM86. Shockingly, it worked - we had support for 256 colour bootsplashes in any supported resolution on 64 bit systems as well as 32 bit ones.

But by now it was obvious that the future was having the kernel manage graphics support, both in terms of native programming and in supporting suspend/resume. Plymouth is much more fully featured than Usplash ever was, but relies on functionality that simply didn't exist when we started this adventure. There's certainly an argument that we'd have been better off making reasonable kernel modesetting support happen faster, but at this point I had literally no idea how to write decent kernel code and everyone should be happy I kept this to userland.

Anyway. The moral of all of this is that sometimes history works out such that you write some software that a huge number of people run without any idea of who you are, and also that this can happen without you having any fucking idea what you're doing.

Write code. Do crimes.

[1] vesafb relied on either the bootloader or the early stage kernel performing a VBE call to set a mode, and then just drawing directly into that framebuffer. When we were doing GPU reinitialisation in userland we couldn't guarantee that we'd run before the kernel tried to draw stuff into that framebuffer, and there was a risk that that was mapped to something dangerous if the GPU hadn't been reprogrammed into the same state. It turns out that having GPU modesetting in the kernel is a Good Thing.

[2] ACPI didn't guarantee that the firmware would reinitialise the graphics hardware, and as a result most machines didn't. At this point Linux didn't have native support for initialising most graphics hardware, so we fell back to doing it from userland. VBEtool was a terrible hack I wrote to try to re-execute the system's graphics hardware through a range of mechanisms, and it worked in a surprising number of cases.

[3] As long as you were willing to deal with 640x480 in 16 colours

[4] Helsinki-Vantaan had astonishingly comfortable seating for time

comments

15 March 2021

Matthew Garrett: Exploring my doorbell

I've talked about my doorbell before, but started looking at it again this week because sometimes it simply doesn't send notifications to my Home Assistant setup - the push notifications appear on my phone, but the doorbell simply doesn't trigger the HTTP callback it's meant to[1]. This is obviously suboptimal, but it's also tricky to debug a device when you have no access to it.

Normally I'd just head straight in with a screwdriver, but the doorbell is shared with the other units in this building and it seemed a little anti-social to interfere with a shared resource. So I bought some broken units from ebay and pulled one of them apart. There's several boards inside, but one of them had a conveniently empty connector at the top with "TX", "RX" and "GND" labelled. Sticking a USB-serial converter on this gave me output from U-Boot, and then kernel output. Confirmation that my doorbell runs Linux, but unfortunately it didn't give me a shell prompt. My next approach would often me to just dump the flash and look for vulnerabilities that way, but this device uses TSOP-48 packaged NAND flash rather than the more convenient SPI NOR flash that I already have adapters to access. Dumping this sort of NAND isn't terribly hard, but the easiest way to do it involves desoldering it from the board and plugging it into something like a Flashcat USB adapter, and my soldering's not good enough to put it back on the board afterwards. So I wanted another approach.

U-Boot gave a short countdown to hit a key before continuing with boot, and for once hitting a key actually did something. Unfortunately it then prompted for a password, and giving the wrong one resulted in boot continuing[2]. In the past I've had good luck forcing U-Boot to drop to a prompt by simply connecting one of the data lines on SPI flash to ground while it's trying to read the kernel - the failed read causes U-Boot to error out. It turns out the same works fine on raw NAND, so I just edited the kernel boot arguments to append "init=/bin/sh" and soon I had a shell.

From here on, things were made easier by virtue of the device using the YAFFS filesystem. Unlike many flash filesystems, it's read/write, so I could make changes that would persist through to the running system. There was a convenient copy of telnetd included, but it segfaulted on startup, which reduced its usefulness. Fortunately there was also a copy of Netcat[3]. If you make a fifo somewhere on the filesystem, you can cat the fifo to a shell, pipe the shell to a netcat listener, and then pipe netcat's output back to the fifo. The shell's output all gets passed to whatever connects to netcat, and whatever's sent to netcat gets passed through the fifo back to the shell. This is, obviously, horribly insecure, but it was enough to get a root shell over the network on the running device.

The doorbell runs various bits of software, one of which is Lighttpd to provide a local API and access to the device. Another component ("nxp-client") connects to the vendor's cloud infrastructure and passes cloud commands back to the local webserver. This is where I found something strange. Lighttpd was refusing to start because its modules wanted library symbols that simply weren't present on the device. My best guess is that a firmware update went wrong and left the device in a partially upgraded state - and without a working local webserver, there was no way to perform any further updates. This may explain why this doorbell was sitting on ebay.

Anyway. Now that I had shell, I could simply dump the flash by copying it directly off the /dev/mtdblock devices - since I had netcat, I could just pipe stuff through that back to my actual computer. Now I had access to the filesystem I could extract that locally and start digging into it more deeply. One incredibly useful tool for this is qemu-user. qemu is a general purpose hardware emulation platform, usually used to emulate entire systems. But in qemu-user mode, it instead only emulates the CPU. When a piece of code tries to make a system call to access the kernel, qemu-user translates that to the appropriate calling convention for the host kernel and makes that call instead. Combined with binfmt_misc, you can configure a Linux system to be able to run Linux binaries from other architectures. One of the best things about this is that, because they're still using the host convention for making syscalls, you can run the host strace on them and see what they're doing.

What I found was that nxp-client was calling back to the cloud platform, setting up an encrypted communication channel (using ChaCha20 and a bunch of key setup stuff I couldn't be bothered picking apart) and then waiting for commands from the cloud. It would then proxy those through to the local webserver. Since I couldn't run the local lighttpd, I just wrote a trivial Python app using http.server and waited to see what requests I got. The first was a GET to a CGI script called editcgi.cgi, along with a path name. I mocked up the GET request to respond with what was on the actual filesystem. The cloud then proceeded to POST to editcgi.cgi, with the same pathname and with new file contents. editcgi.cgi is apparently able to read and write to files on the filesystem.

But this is on the interface that's exposed to the cloud client, so this didn't appear immediately useful - and, indeed, trying to hit the same CGI binary over the local network gave me a 401 unauthorized error. There's a local API spec for these doorbells, but they all refer to scripts in the bha-api namespace, and this script was in the plain cgi-bin namespace. But then I noticed that the bha-api namespace didn't actually exist in the filesystem - instead, lighttpd's mod_alias was configured to rewrite requests to bha-api through to files in cgi-bin. And by using the documented API to get a session token, I could call editcgi.cgi to read and write arbitrary files on the doorbell. Which means I can drop an extra script in /etc/rc.d/rc3.d and get a shell on my doorbell.

This all requires the ability to have local authentication credentials, so it's not a big security deal other than it allowing you to retain access to a monitoring device even after you've moved out and had your credentials revoked. I'm sure it's all fine.

[1] I can ping the doorbell from the Home Assistant machine, so it's not that the network is flaky
[2] The password appears to be hy9$gnhw0z6@ if anyone else ends up in this situation
[3] https://twitter.com/mjg59/status/654578208545751040

comments

9 March 2021

Matthew Garrett: Unauthenticated MQTT endpoints on Linksys Velop routers enable local DoS

(Edit: this is CVE-2021-1000002)

Linksys produces a series of wifi mesh routers under the Velop line. These routers use MQTT to send messages to each other for coordination purposes. In the version I tested against, there was zero authentication on this - anyone on the local network is able to connect to the MQTT interface on a router and send commands. As an example:
mosquitto_pub -h 192.168.1.1 -t "network/master/cmd/nodes_temporary_blacklist" -m ' "data": "client": "f8:16:54:43:e2:0c", "duration": "3600", "action": "start" '
will ask the router to block the client with MAC address f8:16:54:43:e2:0c from the network for an hour. Various other MQTT topics pass parameters to shell scripts without quoting them or escaping metacharacters, so more serious outcomes may be possible.

The vendor has released two firmware updates since report - I have not verified whether either fixes this, but the changelog does not indicate any security issues were addressed.

Timeline:

2020-07-30: Submitted through the vendor's security vulnerability report form, indicating that I plan to disclose in either 90 days or after a fix is released. The form turns out to file a Bugcrowd submission.
2020-07-30: I claim the Bugcrowd submission.
2020-08-19: Vendor acknowledges the issue, is able to reproduce and assigns it a P3 priority.
2020-12-15: I ask if there's an update.
2021-02-02: I ask if there's an update.
2021-02-03: Bugcrowd raise a blocker on the issue, asking the vendor to respond.
2021-02-17: I ask for permission to disclose.
2021-03-09: In the absence of any response from the vendor since 2020-08-19, I violate Bugcrowd disclosure policies and unilaterally disclose.

comments

Next.

Previous.